Week 01: LLM Review

Introduction to Large Language Models and AI in Data Analysis

Published

March 20, 2025

Week 01: LLM Review

Introduction to Large Language Models and their applications in data analysis

Learning Objectives

By the end of this session, students will:

Understand core concepts and architecture behind large language models (LLMs)
Learn how to incorporate AI into data analysis workflows effectively
Distinguish between “Cyborg” and “Centaur” approaches to AI collaboration
Experience the “jagged frontier” of LLM capabilities through hands-on practice
Critically assess capabilities and limitations of AI tools in academic contexts
Develop foundational prompt engineering skills for data analysis tasks

Preparation / Before Class

📚 No Required Reading

This is Week 1 - come ready to explore and discuss!

Reflection Preparation:

Think about your current experience with AI tools (if any)
Consider examples where you’ve encountered AI in your work/studies
Identify one data analysis task you find time-consuming or repetitive

Optional Background:

Ethan Mollick: “Co-Intelligence: Living and Working with AI” (Chapters 1-2)

Class Material

📊 Core LLM Concepts (60 min)

Slideshow: LLM Concepts and Applications

Key Topics Covered:

What are LLMs? Statistical models predicting next tokens from massive training data
The Transformer Revolution: How 2017’s “Attention is All You Need” changed everything
Context Windows: why this matters for data analysis
Training Process: Resources and human feedback
Collaborative Frameworks: How to integrate human and AI work

🔧 Hands-on: FT Graph Challenge (25 min)

The Challenge:

Recreate the Financial Times tariff/yield visualization as accurately as possible using AI assistance.

Note the sophisticated design: multiple data series, annotations, professional styling
Consider: What would you need to recreate this manually vs. with AI assistance?

Learning Approach:

Analyze the original - what data sources, what design elements?
Use AI to find data and generate initial code
Iterate and refine based on results

Discussion Questions

End of Week Reflection:

Personal AI Experience: How have you already incorporated AI into your routine? Which model feels most natural to you?
Error Management: How do you currently deal with AI hallucinations or imperfect answers? What strategies emerged during the FT graph exercise?
The Jagged Frontier: What tasks do you expect AI to excel at? Where do you think it will struggle? Did the visualization exercise match your expectations?

Assignment

Assignment 1: Reproduce the FT Graph

Due: Before Week 2

Choose your approach:

Option A (Standard): Use AI to find data and recreate the visualization with same design principles
Option B (Advanced): Build an interactive dashboard that updates dynamically

Full Assignment Details

Background, Tools and Resources

🤖 Recommended for Data Analysis:

Platform	Strengths	Best For	Cost
ChatGPT 4o/o1	Excellent coding, data analysis tools	R/Python code, statistical methods	$20/month
Claude 4 Sonnet	Great reasoning, document analysis	Research, writing, complex prompts	$20/month
GitHub Copilot	Integrated coding assistance	Real-time code completion	$10/month

Free Tier Capabilities:

Most course tasks achievable with free versions
Paid subscriptions recommended for intensive work – maybe try during this course
Start free, upgrade if you hit limits

Detailed Comparison:

AI Model Selection Guide

Academic Integrity and AI use

Course Philosophy:

AI as Assistant: Use AI to enhance your capabilities, not replace your thinking
Maintain Authority: You remain responsible for all outputs and interpretations
Verify Everything: Always validate AI suggestions, especially statistical claims
Document Usage: Keep track of how AI helped – useful for methodology sections

Red Lines:

Never submit unverified AI output as your own work
Always understand the analysis you’re presenting
Cite AI assistance appropriately (we’ll discuss standards as they evolve)

The Goal:

Become a more capable data analyst who can leverage AI tools effectively while maintaining scientific rigor.

Next Week:

Week 2 - Data Discovery and Documentation where we’ll use AI to understand and document complex datasets.

Some personal comments on AI and this class

I needed to rewrite, edit the slideshow every other month. Bloody hell, this course material is tricky. (While Gosset’s t-test has been around since 1908…)
It was an AI (Claude Sonnet 4.0) that suggest to include the last bit on Academic Integrity and it wrote many lines such as having red lines. Hahh.

--- title: "Week 01: LLM Review" subtitle: "Introduction to Large Language Models and AI in Data Analysis" date: "2025-03-20" --- ::: {.hero-section} ::: {.container} ::: {.hero-title} Week 01: LLM Review ::: ::: {.hero-subtitle} Introduction to Large Language Models and their applications in data analysis ::: ::: ::: --- ![](../images/week1_pic.png) ## Learning Objectives By the end of this session, students will: - Understand core concepts and architecture behind large language models (LLMs) - Learn how to incorporate AI into data analysis workflows effectively - Distinguish between "Cyborg" and "Centaur" approaches to AI collaboration - Experience the "jagged frontier" of LLM capabilities through hands-on practice - Critically assess capabilities and limitations of AI tools in academic contexts - Develop foundational prompt engineering skills for data analysis tasks ## Preparation / Before Class ::: {.week-card .card} ::: {.card-header} 📚 **No Required Reading** ::: ::: {.card-body} **This is Week 1** - come ready to explore and discuss! **Reflection Preparation:** - Think about your current experience with AI tools (if any) - Consider examples where you've encountered AI in your work/studies - Identify one data analysis task you find time-consuming or repetitive **Optional Background:** - Ethan Mollick: "Co-Intelligence: Living and Working with AI" (Chapters 1-2) ::: ::: ## Class Material ::: {.week-card .card} ::: {.card-header} 📊 **Core LLM Concepts (60 min)** ::: ::: {.card-body} **[Slideshow: LLM Concepts and Applications](https://gabors-data-analysis.com/courses/da-w-ai-2025/da-w-ai-01-llm-course.html)** **Key Topics Covered:** - **What are LLMs?** Statistical models predicting next tokens from massive training data - **The Transformer Revolution:** How 2017's "Attention is All You Need" changed everything - **Context Windows:** why this matters for data analysis - **Training Process:** Resources and human feedback - **Collaborative Frameworks:** How to integrate human and AI work ::: ::: ::: {.week-card .card} ::: {.card-header} 🔧 **Hands-on: FT Graph Challenge (25 min)** ::: ::: {.card-body} **The Challenge:** Recreate the [Financial Times tariff/yield visualization](assets/ft-liberation-day-usd-yield-2025-04-11.jpg) as accurately as possible using AI assistance. - Note the sophisticated design: multiple data series, annotations, professional styling - Consider: What would you need to recreate this manually vs. with AI assistance? **Learning Approach:** - Analyze the original - what data sources, what design elements? - Use AI to find data and generate initial code - Iterate and refine based on results ::: ::: ## Discussion Questions **End of Week Reflection:** 1. **Personal AI Experience:** How have you already incorporated AI into your routine? Which model feels most natural to you? 2. **Error Management:** How do you currently deal with AI hallucinations or imperfect answers? What strategies emerged during the FT graph exercise? 3. **The Jagged Frontier:** What tasks do you expect AI to excel at? Where do you think it will struggle? Did the visualization exercise match your expectations? ## Assignment ::: {.callout-note icon="📝"} ## Assignment 1: Reproduce the FT Graph **Due:** Before Week 2 **Choose your approach:** - **Option A (Standard):** Use AI to find data and recreate the visualization with same design principles - **Option B (Advanced):** Build an interactive dashboard that updates dynamically [Full Assignment Details](../assignments/assignment_01.html){.assignment-badge} ::: ## Background, Tools and Resources 🤖 **Recommended for Data Analysis:** | Platform | Strengths | Best For | Cost | |----------|-----------|----------|------| | **ChatGPT 4o/o1** | Excellent coding, data analysis tools | R/Python code, statistical methods | $20/month | | **Claude 4 Sonnet** | Great reasoning, document analysis | Research, writing, complex prompts | $20/month | | **GitHub Copilot** | Integrated coding assistance | Real-time code completion | $10/month | **Free Tier Capabilities:** - Most course tasks achievable with free versions - Paid subscriptions recommended for intensive work -- maybe try during this course - Start free, upgrade if you hit limits **Detailed Comparison:** [AI Model Selection Guide](assets/which-ai.html) ### Academic Integrity and AI use **Course Philosophy:** - **AI as Assistant:** Use AI to enhance your capabilities, not replace your thinking - **Maintain Authority:** You remain responsible for all outputs and interpretations - **Verify Everything:** Always validate AI suggestions, especially statistical claims - **Document Usage:** Keep track of how AI helped -- useful for methodology sections **Red Lines:** - Never submit unverified AI output as your own work - Always understand the analysis you're presenting - Cite AI assistance appropriately (we'll discuss standards as they evolve) **The Goal:** Become a more capable data analyst who can leverage AI tools effectively while maintaining scientific rigor. **Next Week:** [Week 2 - Data Discovery and Documentation](../week02/) where we'll use AI to understand and document complex datasets. ## Some personal comments on AI and this class * I needed to rewrite, edit the slideshow every other month. Bloody hell, this course material is tricky. (While [Gosset's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) has been around since 1908...) * It was an AI (Claude Sonnet 4.0) that suggest to include the last bit on Academic Integrity and it wrote many lines such as having red lines. Hahh.